Daniel Mercer

A new study has found that modern AI systems can complete only a small fraction of real-world freelance project tasks. Research conducted by Scale AI in collaboration with the Center for AI Safety suggests that tools such as ChatGPT, Gemini, and Claude are still far from fully replacing professionals in fields like design, programming, and data analysis, according to The Washington Post.

Researchers tested the models on hundreds of actual freelance projects, ranging from 3D animation and web game development to scientific paper formatting. The results were sobering: the best-performing system successfully completed just 2.5% of tasks. Nearly half of the projects were delivered at low quality, while about one-third were left unfinished altogether. In many cases, AI systems produced corrupted files, ignored key client requirements, or delivered outputs that appeared plausible at first glance but contained critical flaws upon closer inspection.

In interior design tests, for example, AI generated floor plans that looked realistic but were technically incorrect and lacked essential detail. Data analysis tasks also exposed weaknesses: when asked to build a happiness data dashboard, the AI overlaid text on charts, mixed up color schemes, and omitted entire countries. In game development, one model produced a functioning game but completely ignored the assigned theme—delivering an abstract concept instead of a brewing simulation.
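
To make the data-analysis example concrete, here is a minimal Python sketch of the kind of dashboard such a brief might call for. The file name happiness.csv, the column names, and the two-panel layout are illustrative assumptions rather than the study's actual materials; the point is simply that a sound version keeps labels clear of the charts and aggregates over every country in the data, the two things the tested models reportedly got wrong.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per country with a happiness score and a region.
df = pd.read_csv("happiness.csv")

fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(10, 8), constrained_layout=True)

# Top panel: the ten highest-scoring countries, one consistent color.
top = df.nlargest(10, "happiness_score")
ax_top.barh(top["country"], top["happiness_score"], color="#4c72b0")
ax_top.set_title("Top 10 countries by happiness score")
ax_top.set_xlabel("Happiness score")

# Bottom panel: regional averages computed over every row, so no country is
# silently dropped; constrained_layout keeps labels off the plotting area.
by_region = df.groupby("region")["happiness_score"].mean().sort_values()
ax_bottom.barh(by_region.index, by_region.values, color="#55a868")
ax_bottom.set_title("Average happiness score by region")
ax_bottom.set_xlabel("Happiness score")

fig.savefig("happiness_dashboard.png", dpi=150)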

Jason Hausenloy, one of the study’s authors, attributed the poor performance to two core limitations: the lack of long-term memory and weak visual understanding. Current chatbots do not learn from mistakes over extended projects and struggle to incorporate client feedback. When generating 3D models, AI relies on code rather than visual interfaces, often leading to unstable or inconsistent imagery.
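
Hausenloy's point about code-driven 3D modelling is easier to picture with a small sketch. In a code-only workflow the model emits geometry as text and never inspects the rendered result, so small mistakes survive unnoticed. The example below, a unit cube written in the Wavefront OBJ format in plain Python, is an illustrative assumption about what such output looks like, not the study's actual setup.

# Eight corner vertices of a unit cube.
vertices = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),  # bottom face
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),  # top face
]

# Faces reference vertices by 1-based index; a single off-by-one error here
# produces a malformed mesh that only a visual check in a viewer would catch.
faces = [
    (1, 2, 3, 4), (5, 6, 7, 8),  # bottom, top
    (1, 2, 6, 5), (2, 3, 7, 6),  # sides
    (3, 4, 8, 7), (4, 1, 5, 8),
]

with open("cube.obj", "w") as f:
    for x, y, z in vertices:
        f.write(f"v {x} {y} {z}\n")
    for face in faces:
        f.write("f " + " ".join(str(i) for i in face) + "\n")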

That said, progress is measurable. Gemini 3 Pro, released in November 2025, completed 1.3% of tasks—up from 0.8% achieved by its previous version.

The broader trend toward AI autonomy continues. Companies are already finding that a single employee equipped with AI tools can be markedly more productive than one without them. Fully replacing skilled professionals, however, remains firmly in the realm of science fiction. The cost difference is striking: building a game with a human developer averaged $1,485, compared with under $30 using Claude Sonnet. But the quality gap still makes human labor indispensable.

AI Research Contributor
Daniel Mercer is an AI research contributor specializing in large language models, benchmarking, and multimodal systems. He writes about model capabilities, limitations, and real-world performance across leading AI assistants and platforms.
